Students and parents alike often believe that academic success is determined purely by the duration or quality of studying, when in reality many other factors can affect a student’s performance. Although it’s logical to assume that persistent and productive studying will always lead to better academic performance, that assumption overlooks the fact that students’ personal lives and extracurricular activities also play a key part in their willingness or ability to study, and in their academic performance.
This raises the next question that many may ask: “Then, what other factors affect academic performance?” I’ll attempt to answer it using a data set provided by the UC Irvine Machine Learning Repository. During the 2005-2006 school year, in the Alentejo (Ah-len-TAY-zhoo) region of Portugal, researchers Paulo Cortez and Alice Silva collected data from Gabriel Pereira School and Mousinho da Silveira School using school reports and questionnaires. Using their results, they authored a report in an attempt to develop “more efficient student prediction tools…[improve] the quality of education, and [enhance] school resource management” (Cortez, Silva).
Students at Gabriel Pereira School - AEGP
Using their compiled data, I will try to answer the following questions.
- How do the family circumstances of secondary students at Gabriel Pereira School and Mousinho da Silveira School affect their academic performance in math and Portuguese language classes?
- How do extracurricular and social activities of secondary students at Gabriel Pereira School and Mousinho da Silveira School relate to their academic performance?
After downloading the data set from the UCI Machine Learning Repository and making initial observations, one of the earliest problems I faced was importing it properly. After some difficulty understanding the file type and the import function, I found it best to read the provided data sets with “delim()” so that they would load properly and be ready for alteration and analysis.
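As a rough illustration of the import step, here is a minimal sketch. It makes two assumptions that the text does not confirm: that “delim()” refers to a delimiter-aware reader such as base R's `read.delim()` (readr's `read_delim()` would be the tidyverse counterpart), and that the files are semicolon-separated. The sample rows are made up so the block runs on its own.

```r
# Write a tiny semicolon-separated sample to a temporary file.
# The rows are made up; they only mimic the shape of the real files.
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "school;sex;age;G3",
  "GP;F;17;10",
  "MS;M;16;14"
), sample_path)

# read.delim() expects tab-separated fields by default, so the
# separator is overridden here (assumption: fields are split by ";").
students <- read.delim(sample_path, sep = ";")

# Inspect the column types to confirm the file was split into columns.
str(students)
```

If a file read this way comes back as a single wide column, the separator is usually the first thing to check.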
The biggest problem I faced came from a small note left by the authors stating that both data sets contained duplicate students. This was an issue because I had planned from the beginning to merge the data sets and analyze the average of the variables. Fortunately, the authors provided a line of code that helped me determine which observational units were duplicates and how many there were, so I was able to use the “full_join()” function to combine the two data sets according to the given criteria. I then filtered for the needed variables but ran into more difficulty: because 5 of the 8 variables I was analyzing were not part of the defining criteria, the merge had produced 2 columns for each of them. Ultimately, I consolidated the affected variables using the “coalesce()” function and then filtered further to organize the final data set.
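The merge-and-consolidate step described above can be sketched as follows. This is a toy version under stated assumptions, not the exact code used for the report: the key columns (`school`, `sex`, `age`), the suffixes, and all values are hypothetical, and averaging the two grades with `na.rm = TRUE` is one possible way to build `G3_avg`.

```r
library(dplyr)

# Toy stand-ins for the math and Portuguese tables (made-up values).
math <- tibble(
  school = c("GP", "GP", "MS"),
  sex    = c("F", "M", "F"),
  age    = c(17, 16, 18),
  famrel = c(4, 5, 3),
  G3     = c(10, 14, 8)
)
port <- tibble(
  school = c("GP", "MS", "MS"),
  sex    = c("F", "F", "M"),
  age    = c(17, 18, 17),
  famrel = c(4, 3, 2),
  G3     = c(12, 11, 9)
)

# full_join() keeps every student from both tables. Columns that are
# not join keys appear twice, once per source table, with a suffix.
merged <- full_join(math, port,
                    by = c("school", "sex", "age"),
                    suffix = c("_math", "_port"))

combined <- merged %>%
  mutate(
    # coalesce() takes the first non-missing value across the pair.
    famrel = coalesce(famrel_math, famrel_port),
    # Average the two final grades, ignoring a missing subject
    # (assumption: students in only one table keep their single grade).
    G3_avg = rowMeans(cbind(G3_math, G3_port), na.rm = TRUE)
  ) %>%
  select(school, sex, age, famrel, G3_avg)

print(combined)
```

Students who appear in both tables end up on a single row, while students found in only one table keep their values from that table.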
## # A tibble: 7 × 4
##   variable   type        desc.                                        values
##   <chr>      <chr>       <chr>                                        <chr>
## 1 Pstatus    Categorical parent's cohabitation status                 together/…
## 2 famsup     Categorical family educational support                   Y/N
## 3 famrel     Numeric     quality of family relationships              1-5
## 4 activities Categorical extra-curricular activities                  Y/N
## 5 romantic   Categorical in romantic relationship                     Y/N
## 6 goout      Categorical frequency of going out with friends          1-5
## 7 G3_avg     Numeric     average of all grades in Portuguese and Math 1-20
First, before seeing how these variables interact with academic performance, I believe it’s important to at least view the distribution of the grades themselves.
The distribution of grades is very slightly left skewed, as can be seen above. The median final grade is 11 and the IQR (interquartile range), which represents the range of the middle 50% of the data, is 4. This indicates that the middle half of students earn grades of roughly 9 to 13 out of 20. As a percentage, the kind of scale used in American grading, that is between 45 and 65 out of 100. Notably, a considerable number of students have failed completely.
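The summary statistics and the scale conversion above can be reproduced with a few lines of base R. This is a minimal sketch on made-up grades, and `to_percent()` is a hypothetical helper introduced only for illustration.

```r
# Made-up final grades on the 0-20 scale, for illustration only.
grades <- c(0, 8, 9, 10, 11, 11, 12, 13, 15)

# The median marks the center of the distribution.
grade_median <- median(grades)

# The first and third quartiles bound the middle 50% of the data,
# and IQR() returns the distance between them.
grade_quartiles <- quantile(grades, c(0.25, 0.75))
grade_iqr <- IQR(grades)

# Hypothetical helper: rescale a grade out of 20 to a percentage.
to_percent <- function(grade, max_grade = 20) {
  100 * grade / max_grade
}

# Grades of 9 and 13 out of 20 correspond to 45 and 65 out of 100.
print(to_percent(c(9, 13)))
```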
For this analysis, we’ll be considering the variables “Pstatus”, “famsup”, and “famrel” and their relation to “G3_avg”, with specific emphasis on “famrel”. Many studies, like this one from the National Library of Medicine, have underlined the role of unhealthy family relationships in lowering academic performance through a variety of psycho-physiological causes.
Based on the calculated data, there were 599 students whose parents live together and 83 students whose parents live separately.
Based on the calculated data, there were 414 students who received educational support from their family and 268 students who didn’t.
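Group sizes like the ones above can be tallied with base R's `table()`. The sketch below uses a made-up four-student data frame, and the category labels (`"together"`, `"apart"`, `"yes"`, `"no"`) are assumptions about how the values are coded.

```r
# Made-up rows; the labels are assumed, not taken from the data set.
students <- data.frame(
  Pstatus = c("together", "together", "apart", "together"),
  famsup  = c("yes", "no", "yes", "yes")
)

# table() counts how many students fall into each category.
pstatus_counts <- table(students$Pstatus)
famsup_counts <- table(students$famsup)

print(pstatus_counts)
print(famsup_counts)
```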
The distribution of family relationship ratings is left skewed, as can be seen above. The median rating is 4 and the IQR is 1. This shows that most students generally have positive relationships with their family members.
According to the graph, the median grade of students whose parents live apart is 11 and the IQR is 4. For students whose parents live together, the median grade is also 11 and the IQR is 4. Unexpectedly, this shows that students’ typical grades are almost identical regardless of whether their parents live together or apart.
According to the graph, the median grade of students who don’t receive educational support from their family is 11 and the IQR is 5. For students who do receive that support, the median grade is 11 and the IQR is 4. Again, unexpectedly, this shows that family educational support did not have much bearing on students’ grades, but strangely, the grades of students without educational support showed slightly more variability, extending toward higher grades.
The median and IQR of students’ grades, grouped by how they rated their family relationships, can be seen below.
## # A tibble: 5 × 3
##   famrel median   IQR
##    <dbl>  <dbl> <dbl>
## 1      1     10     4
## 2      2     11     5
## 3      3     11     4
## 4      4     11     5
## 5      5     11     5
This shows that grades are generally consistent across all family relationship ratings. However, as can be seen in the graph and table, there appears to be a slight positive association between better family relationships and higher academic performance: the median grade is 10 for students who rated their family relationships 1, and 11 for ratings of 2 through 5.
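A per-rating summary like the table above can be built with a grouped summary in dplyr. The following is a sketch on made-up data rather than the report's actual code; the column names `famrel` and `G3_avg` follow the variable table, and everything else is illustrative.

```r
library(dplyr)

# Made-up ratings and grades, for illustration only.
students <- tibble(
  famrel = c(1, 1, 2, 2, 2, 3, 3, 3),
  G3_avg = c(8, 12, 9, 11, 13, 10, 11, 14)
)

# One row per family relationship rating, with the median and
# interquartile range of the grades in that group.
by_famrel <- students %>%
  group_by(famrel) %>%
  summarise(
    median = median(G3_avg),
    IQR    = IQR(G3_avg)
  )

print(by_famrel)
```

Comparing the medians row by row is what reveals whether grades shift as the rating increases, while the IQR column shows whether the spread changes.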
After examining the data, it appears that family circumstances, such as whether parents live together, the presence of family educational support, and the quality of family relationships, have only a modest association with students’ academic performance in math and Portuguese. Similarly, students’ engagement in extracurricular and social activities, including participation in after-school activities, romantic relationships, and the frequency of going out with friends, does not seem to substantially influence their final grades, suggesting that these personal and social factors may play a smaller role than commonly assumed.
However, I think it’s important to consider the scope and context of the data, and I suggest that more information be collected across a variety of schools in different regions for the results to be more conclusive. In addition, I think it’d be interesting to research further why students who don’t receive educational support from their family have slightly higher grades.